22.B. Look through the ergm.terms page (use ?ergm.terms). For the LHDS network, identify two other network configurations that could explain the observed network. Estimate and interpret the models.
# clear the workspace.
rm(list = ls())
# First load the libraries we need.
library(sna, quietly = TRUE)
library(network, quietly = TRUE)
# Now, load the UserNetR package for the data.
library(UserNetR)
summary(lhds, print.adj = FALSE)
## Network attributes:
## vertices = 1283
## directed = FALSE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## title = lhds
## total edges = 2708
## missing edges = 0
## non-missing edges = 2708
## density = 0.00329279
##
## Vertex attributes:
##
## hivscreen:
## character valued attribute
## attribute summary:
## N Y
## 461 804
##
## nutrition:
## character valued attribute
## attribute summary:
## N Y
## 326 941
##
## popmil:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00055 0.01722 0.04094 0.15865 0.12874 10.11107
##
## state:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## MO OH MA IL KS NJ WI NC FL MN
## 73 72 66 63 63 62 62 57 49 49
## vertex.names:
## character valued attribute
## 1283 valid vertex names
##
## years:
## integer valued attribute
## 1283 values
##
## No edge attributes
We can see from the summary that the lhds object is of class network and has 1,283 vertices (nodes) and 2,708 edges (lines). There are 6 vertex attributes.
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 3rd one for the colors.
# let's plot the data and see what it looks like.
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[3])
title("Communication network for 1,283 health departments")
# define a few terms.
g <- dim(as.matrix(lhds))[1]
l <- sum(as.matrix(lhds))/2
den <- l / (g*(g-1)/2)
# generate the random graph.
set.seed(605)
random.graph <- rgraph(
g, # the number of nodes in the lhds network.
1, # we want just 1 random graph.
tprob = den, # the density of the lhds network.
mode = "graph" )
random.net <- as.network(random.graph, directed = FALSE)
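As a sanity check (a sketch using sna's gden() function), the hand-computed density should match the value from the package, and the random graph's density should be close to it by construction:

```r
# Density of the observed network; should equal the `den` computed above.
gden(lhds, mode = "graph")
# Density of the random graph; approximately the same, since we used
# tprob = den when generating it.
gden(random.net, mode = "graph")
```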
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 3rd and 9th one for the colors.
# Now, take a look at both graphs.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[3])
title("Communication network for\n 1,283 health departments")
set.seed(605)
gplot(random.net, usearrows = FALSE, edge.col = "grey80", vertex.col=mycols[9])
title("Random network")
par(op)
Well, the random.net network looks like a big hairball. In other words, we cannot really see any subgroups or structural features other than the isolates and some peripheral members. For the lhds network, things are different:
First, there are a bunch of subgroups with about 6-12 members that are disconnected from the main component
Second, there are multiple health departments with high degree that connect otherwise disconnected departments (betweenness)
Third, most departments can reach the other departments, meaning that, although the closeness is low, they are not disconnected (reachable)
There is actually a function in the sna package, component.dist(), that we could use to look at the components. Let’s run the command lhds.components <- component.dist(lhds) to create an object that has this information. See ?component.dist for info about the object properties.
# How many components are there?
length(unique(lhds.components$membership))
## [1] 80
# How big is the largest component?
max(lhds.components$csize)
## [1] 1083
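We can also look at the full distribution of component sizes. Here is a quick sketch using the csize element of the component.dist() object:

```r
# How many components are there of each size?
table(lhds.components$csize)
```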
# create the degree scores for each network.
lhds.deg <- degree(lhds, gmode="graph")
random.deg <- degree(random.net, gmode="graph")
op <- par(mfrow=c(2,1), mai=c(0.8,0.8,0.2,0.2))
hist(lhds.deg, breaks=20, main="", xlab="Degree (lhds)" , xlim=range(0:25), ylim=range(0:300), col=mycols[3])
hist(random.deg, breaks=20, main="", xlab="Degree (Random)", xlim=range(0:25), ylim=range(0:300), col=mycols[9])
par(op)
In looking at the distributions, we can see that there is much more right skew in the degree distribution for the lhds network. This indicates that there are more nodes in the lhds network with high degree than in the network with a random assignment of ties. We can also see this by comparing the mean degree in each network.
# calculate means for degree.
mean(lhds.deg)
## [1] 4.221356
mean(random.deg)
## [1] 4.065472
The ERGM expresses the probability of observing a tie between nodes i and j given some terms (i.e. network configurations). The edge independence model of Erdős and Rényi (1959) uses a single term, the number of edges in the graph, to represent the probability of a tie. In the ergm package, this is the edges term.
# first load the ergm package.
library(ergm, quietly = TRUE)
# run the model.
edge.indep.lhds <- ergm(lhds ~ edges)
# check out the summary.
summary(edge.indep.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -5.71272 0.01925 0 -296.8 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 36365 on 822402 degrees of freedom
##
## AIC: 36367 BIC: 36379 (Smaller is better.)
The value of -5.712 indicates that the density of the network is far below .5 or 50%. Recall that an edges term of 0 would represent 50% or 0.5 density.
As discussed in the Introduction to Exponential Random Graph Models (ERGMs) in R lab, in the general formulation of the ERGM, the \(\delta\) represents the change statistic, or the change in the statistic of interest when an edge is added (i.e., when \(Y_{ij}\) goes from 0 to 1). The change statistic for the edges term is always 1, so we can think of the probability of a tie between i and j as the inverse logit (logistic transformation) of the coefficient for the edges term. Specifically, we can use the calculation that is usual for a logistic regression to interpret the coefficient:
\(\frac{1}{1 + e^{-(\theta_1X_1)}}\)
If we plug in the value of -5.71272 that is returned by the ergm() function, we get:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.71272 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = \frac{1}{1 + e^{-(-5.71272 \times 1)}} = 0.003\)
Does the value 0.003 look familiar?
The value of 0.003 is returned by using the command plogis().
# we could write it out.
plogis(-5.71272*1)
## [1] 0.003292796
# or, we could pull from the model.
plogis(edge.indep.lhds$coef[1]*1)
## edges
## 0.00329279
years of experience.
# Examine the correlation.
cor(degree(lhds,gmode="graph"), lhds%v%"years", use = "complete.obs")
## [1] 0.1678848
The correlation between degree and years of experience is 0.1678848, indicating that individuals who have more years of experience are more likely to communicate with more departments. In the model, we would want to include a term that takes this into account since individuals with more years will have higher degree.
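One descriptive way to see this relationship is the mean degree within each years category. A quick sketch:

```r
# Mean degree for each value of years of experience (0-3);
# higher values indicate more communication partners.
tapply(degree(lhds, gmode = "graph"), lhds %v% "years", mean)
```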
years of experience in that position differ in their degree. Interpret the coefficients.
# run the model.
years.degree.lhds <- ergm(lhds ~ edges + nodefactor("years"))
# check out the summary.
summary(years.degree.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodefactor("years")
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.12630 0.06277 0 -97.598 <1e-04 ***
## nodefactor.years.1 0.13571 0.04506 0 3.012 0.0026 **
## nodefactor.years.2 0.26097 0.04209 0 6.201 <1e-04 ***
## nodefactor.years.3 0.32162 0.03981 0 8.080 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 36288 on 822399 degrees of freedom
##
## AIC: 36296 BIC: 36342 (Smaller is better.)
# Take a look at the values for the years attributes.
table(lhds%v%"years")
##
## 0 1 2 3
## 244 268 325 419
Note that the nodefactor term adds one network statistic to the model for each categorical value of the variable we have passed, with the first category excluded as the reference category. Here, the value of 0 for years in position is excluded from the model, so individuals with 0 years of experience in that position are the referent category. For the nodefactor term, a positive value indicates that a node with that particular value of the attribute is more likely to have an edge, relative to the referent category. Alternatively, a negative value indicates that a node with that particular value of the attribute is less likely to have an edge, relative to the referent category.
Let’s look at this for our example. Looking at the table we see:
the term nodefactor.years.1 has a value of 0.13571, indicating that individuals who have 1 year of experience are more likely to have edges, compared to those with 0 years.
the term nodefactor.years.2 has a value of 0.26097, indicating that individuals who have 2 years of experience are more likely to have edges, compared to those with 0 years.
the term nodefactor.years.3 has a value of 0.32162, indicating that individuals who have 3 years of experience are more likely to have edges, compared to those with 0 years.
Recall that in the general formulation of the ERGM, the \(\delta\) represents the change statistic, or the change in the statistic of interest when an edge is added (i.e., when \(Y_{ij}\) goes from 0 to 1). The change statistic for the nodefactor term is different from the edges term we saw above. If the predictor is categorical, the value of the change statistic is 0, 1, or 2. Specifically:
If neither of the nodes have the characteristic of interest, the change statistic is 0.
A value of 1 indicates that one of the nodes in the dyad has the characteristic.
A value of 2 indicates that both of the nodes in the dyad have the characteristic.
Again, let’s use the calculation that is usual for a logistic regression to interpret the coefficient:
\(\frac{1}{1 + e^{-(\theta_1X_1)}}\)
Using the coefficient of 0.13571, the predicted probability of a tie between two nodes that each have 1 year of experience as a leader is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{years} \times \delta_{years})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-6.12630 \times 1) + (0.13571 \times 2))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 + 0.27142)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.85488) = 0.00285769\)
The value of 0.003 is returned by using the command plogis().
plogis(years.degree.lhds$coef[1]*1 + years.degree.lhds$coef[2]*2)
## edges
## 0.002857662
Note that this is a fairly small effect. That is, based on the model, two nodes that both have 1 year of experience have about a 0.3% chance of being connected. But, the probability of a tie between two nodes who both have 0 years of experience is based on just the coefficient for the edges term:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630) = 0.002179878\)
The value of 0.002 is returned by using the command plogis().
plogis(years.degree.lhds$coef[1]*1)
## edges
## 0.002179878
What is the probability of a tie between two nodes where one node has 1 year of experience and the other node has 3 years of experience? We need the coefficient for each of those years categories. In this case, the change statistic is 1 for each coefficient, since one node in the dyad falls in each category:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{years} \times \delta_{years} )\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-6.12630 \times 1) + (0.13571 \times 1) + (0.3216247 \times 1))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.12630 + 0.13571 + 0.3216247)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.6689658) = 0.003439564\)
The value of 0.003 is returned by using the command plogis().
plogis(years.degree.lhds$coef[1]*1 + years.degree.lhds$coef[2]*1 + years.degree.lhds$coef[4]*1)
## edges
## 0.00343954
Nodes where ego has 1 year of experience and alter has 3 years of experience have about a 0.3% chance of being connected.
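The dyad-by-dyad calculations above can be collected into a single table. Here is a sketch that pulls the estimates from the fitted model with coef():

```r
# Predicted tie probability for every pairing of years categories.
# coef() returns the edges coefficient followed by the three
# nodefactor coefficients (years 1-3); 0 years is the referent.
b <- coef(years.degree.lhds)
eff <- c(0, b[2], b[3], b[4])
probs <- outer(eff, eff, function(x, y) plogis(b[1] + x + y))
dimnames(probs) <- list(years.i = 0:3, years.j = 0:3)
round(probs, 5)
```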
hivscreen. What is the interpretation of this matrix?
mixingmatrix(lhds,"hivscreen")
## Note: Marginal totals can be misleading
## for undirected mixing matrices.
## N Y
## N 526 632
## Y 632 1498
The mixingmatrix() function shows the pattern of ties based on the hivscreen variable. We can see that among pairs where both departments say Yes to having an HIV screen, it appears that there are more ties compared to those relations where ego says Yes and alter says No (and vice versa).
hivscreen have a preference for communicating with departments that also provide an hivscreen.
# run the model.
hiv.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen"))
# check out the summary.
summary(hiv.homophily.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("hivscreen")
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.35331 0.03827 0 -166.02 <1e-04 ***
## nodematch.hivscreen 1.00204 0.04428 0 22.63 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 35780 on 822401 degrees of freedom
##
## AIC: 35784 BIC: 35807 (Smaller is better.)
hivscreen?
Note that the term adds one network statistic to the model for the variable. For the nodematch term, positive values indicate that for a pair of nodes with the same attribute value, an edge is more likely. Alternatively, a negative value indicates that for a pair of nodes with the same attribute value, an edge is less likely. Thus, positive coefficients indicate the presence of homophily and negative coefficients indicate the presence of heterophily.
Let’s work through our example to see the interpretation of the coefficient. Looking at the table we see:
nodematch.hivscreen has a value of 1.00204, indicating that ties are more likely among nodes with the same value for the variable hivscreen.
Recall that in the general formulation of the ERGM, the \(\delta\) represents the change statistic, or the change in the statistic of interest when an edge is added (i.e., when \(Y_{ij}\) goes from 0 to 1). The change statistic for the nodematch term is similar to the edges term we saw above. If the predictor is categorical, the value of the change statistic is 0 or 1. Specifically:
If i and j have the same value for a categorical covariate the change statistic is 1.
If i and j do not have the same value for a categorical covariate the change statistic is 0.
Again, let’s use the calculation that is usual for a logistic regression to interpret the coefficient:
\(\frac{1}{1 + e^{-(\theta_1X_1)}}\)
Using the coefficient of 1.00204, the predicted probability of an edge between nodes with the same hivscreen designation is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 1.00204 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 1.00204)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.35127) = 0.004719743\)
The value of 0.005 is returned by using the command plogis().
Note that this is a fairly small effect. That is, based on the model, nodes with the same hivscreen designation have about a 0.5% chance of being connected. But, ties are pretty rare in the network anyway, so we need to think about this relative to the effect sizes we have already seen.
plogis(hiv.homophily.lhds$coef[1]*1 + hiv.homophily.lhds$coef[2]*1)
## edges
## 0.004719753
hivscreen?
This would be the same, because the relationship is still homophilous.
Note, however, that we could also estimate differential homophily terms, allowing the effect of homophily to vary by “Yes” or “No” regarding the hivscreen attribute by setting the diff= argument to TRUE in the ergm() function.
# run the model with the diff=TRUE argument.
hiv.diff.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen", diff=TRUE))
# check out the summary.
summary(hiv.diff.homophily.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("hivscreen", diff = TRUE)
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.35331 0.03827 0 -166.02 <1e-04 ***
## nodematch.hivscreen.N 1.05211 0.05810 0 18.11 <1e-04 ***
## nodematch.hivscreen.Y 0.98504 0.04621 0 21.32 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 35778 on 822400 degrees of freedom
##
## AIC: 35784 BIC: 35819 (Smaller is better.)
This returns two coefficients for hivscreen. Using the coefficient of 1.05211, the predicted probability of an edge between nodes who do not provide an hivscreen is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 1.05211 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 1.05211)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.3012) = 0.004960875\)
The value of 0.005 is returned by using the command plogis().
plogis(hiv.diff.homophily.lhds$coef[1]*1 + hiv.diff.homophily.lhds$coef[2]*1)
## edges
## 0.00496086
Now for the Yes responses. Using the coefficient of 0.98504, the predicted probability of an edge between nodes who do provide an hivscreen designation is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{hivscreen.homophily} \times \delta_{hivscreen.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 \times 1 + 0.98504 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.35331 + 0.98504)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.36827) = 0.004640555\)
The value of 0.005 is returned by using the command plogis().
plogis(hiv.diff.homophily.lhds$coef[1]*1 + hiv.diff.homophily.lhds$coef[3]*1)
## edges
## 0.004640558
So, the model shows that, although there is homophily based on hivscreen, it differs based on “Yes” and “No”. Specifically, homophily is stronger among the “No” departments.
Note that the differential homophily model is nested within the uniform homophily model. We could determine whether there is statistical evidence of an effect of differential homophily, compared to uniform homophily, by conducting a \(\chi^2\) test for the difference in the deviance (note that this comparison has 1 degree of freedom). We fail to reject that test. Also, comparison of the AIC and BIC shows that the uniform homophily model is preferred to the differential homophily model (the AIC values are essentially identical, and the BIC favors the more parsimonious uniform model).
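That comparison can be sketched in code. The pchisq() line reproduces the hand calculation from the deviances reported in the two summaries, and the anova() method for ergm objects performs a similar nested comparison:

```r
# Chi-square test on the change in residual deviance (1 degree of freedom).
pchisq(35780 - 35778, df = 1, lower.tail = FALSE)  # about 0.16; fail to reject.
# Nested model comparison via the anova() method for ergm fits.
anova(hiv.homophily.lhds, hiv.diff.homophily.lhds)
```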
hivscreen lead to the same conclusion as the ERG model for hivscreen?
No. The mixing matrix alone can be misleading because it does not account for the marginal distribution of "Yes" and "No" responses; the ERG model supports our conclusion that there is homophily based on hivscreen.
nutrition. What is the interpretation of this matrix?
The mixingmatrix() function shows the pattern of ties based on the nutrition variable. We can see that among pairs where both departments say Yes to having nutrition programming, it appears that there are more ties compared to those relations where ego says Yes and alter says No (and vice versa). Note, however, that there are more off-diagonal ties (i.e. Yes/No and No/Yes) than there are ties among those who both say No to having a nutrition program. It looks like there is homophily for the Yes nodes, but not for the No nodes. To evaluate this, we want to take into account the actual distribution of Yes and No responses in the network. The ergm() function will do this for us.
mixingmatrix(lhds,"nutrition")
## Note: Marginal totals can be misleading
## for undirected mixing matrices.
## N Y
## N 216 648
## Y 648 1812
nutrition programs have a preference for communicating with departments that also provide nutrition programs.
# run the model.
nutrition.homophily.lhds <- ergm(lhds ~ edges + nodematch("nutrition"))
# check out the summary.
summary(nutrition.homophily.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("nutrition")
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.17403 0.03839 0 -160.83 <1e-04 ***
## nodematch.nutrition 0.68013 0.04437 0 15.33 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 36106 on 822401 degrees of freedom
##
## AIC: 36110 BIC: 36133 (Smaller is better.)
nutrition programs?
Let's work through our example to see the interpretation of the coefficient. Looking at the table we see:
nodematch.nutrition has a value of 0.68013, indicating that ties are more likely among nodes with the same value for the variable nutrition.
Recall that in the general formulation of the ERGM, the \(\delta\) represents the change statistic, or the change in the statistic of interest when an edge is added (i.e., when \(Y_{ij}\) goes from 0 to 1). The change statistic for the nodematch term is similar to the edges term we saw above. If the predictor is categorical, the value of the change statistic is 0 or 1. Specifically:
If i and j have the same value for a categorical covariate the change statistic is 1.
If i and j do not have the same value for a categorical covariate the change statistic is 0.
Again, let’s use the calculation that is usual for a logistic regression to interpret the coefficient:
\(\frac{1}{1 + e^{-(\theta_1X_1)}}\)
Using the coefficient of 0.68013, the predicted probability of an edge between nodes with the same nutrition designation is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.68013 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.68013)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.4939) = 0.004094939\)
The value of 0.004 is returned by using the command plogis().
Note that this is a fairly small effect. That is, based on the model, nodes with the same nutrition designation have about a 0.4% chance of being connected. But, ties are pretty rare in the network anyway, so we need to think about this relative to the effect sizes we have already seen.
plogis(nutrition.homophily.lhds$coef[1]*1 + nutrition.homophily.lhds$coef[2]*1)
## edges
## 0.004094943
nutrition program?
This would be the same, because the relationship is still homophilous.
Note, however, that we could also estimate differential homophily terms, allowing the effect of homophily to vary by “Yes” or “No” regarding the nutrition attribute by setting the diff= argument to TRUE in the ergm() function.
# run the model with the diff=TRUE argument.
nutrition.diff.homophily.lhds <- ergm(lhds ~ edges + nodematch("nutrition", diff=TRUE))
# check out the summary.
summary(nutrition.diff.homophily.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("nutrition", diff = TRUE)
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.17403 0.03839 0 -160.833 <1e-04 ***
## nodematch.nutrition.N 0.67581 0.07824 0 8.637 <1e-04 ***
## nodematch.nutrition.Y 0.68064 0.04503 0 15.115 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 36106 on 822400 degrees of freedom
##
## AIC: 36112 BIC: 36147 (Smaller is better.)
This returns two coefficients for nutrition. Using the coefficient of 0.67581, the predicted probability of an edge between nodes who do not provide a nutrition program is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.67581 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.67581)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.49822) = 0.004077359\)
The value of 0.004 is returned by using the command plogis(). Note: this is a bit off due to rounding in the ergm() summary.
plogis(nutrition.diff.homophily.lhds$coef[1]*1 + nutrition.diff.homophily.lhds$coef[2]*1)
## edges
## 0.004077395
Now for the Yes responses. Using the coefficient of 0.68064, the predicted probability of an edge between nodes who do provide a nutrition program is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{nutrition.homophily} \times \delta_{nutrition.homophily})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 \times 1 + 0.68064 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-6.17403 + 0.68064)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-5.49339) = 0.00409702\)
The value of 0.00409702 is returned by using the command plogis().
plogis(nutrition.diff.homophily.lhds$coef[1]*1 + nutrition.diff.homophily.lhds$coef[3]*1)
## edges
## 0.004097045
So, the model shows that, although there is homophily based on nutrition, it differs slightly between the "Yes" and "No" departments. Specifically, homophily is marginally stronger among the "Yes" departments, but the difference is quite small.
Note that the differential homophily model is nested within the uniform homophily model. We could determine whether there is statistical evidence of an effect of differential homophily, compared to uniform homophily, by conducting a \(\chi^2\) test for the difference in the deviance (note that this comparison has 1 degree of freedom). We fail to reject that test. Also, comparison of the AIC and BIC shows that the uniform homophily model is preferred to the differential homophily model.
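A sketch of the information-criterion comparison, using the AIC() and BIC() generics on the fitted ergm objects:

```r
# Smaller is better; both criteria favor the uniform homophily model here.
AIC(nutrition.homophily.lhds, nutrition.diff.homophily.lhds)
BIC(nutrition.homophily.lhds, nutrition.diff.homophily.lhds)
```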
nutrition lead to the same conclusion as the ERG model for nutrition?
No. The model supports our conclusion that there is homophily based on nutrition. In fact, it also shows that our initial belief that there was no homophily among the "No" responses was misguided, as the mixingmatrix() output did not account for the fact that there are nearly three times as many "Yes" responses as "No" responses.
hivscreen and nutrition. Interpret the coefficients.
# run the model.
hiv.nutrition.homophily.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition"))
# check out the summary.
summary(hiv.nutrition.homophily.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition")
##
## Iterations: 8 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.69202 0.04851 0 -137.96 <1e-04 ***
## nodematch.hivscreen 0.93433 0.04455 0 20.97 <1e-04 ***
## nodematch.nutrition 0.56250 0.04465 0 12.60 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 35608 on 822400 degrees of freedom
##
## AIC: 35614 BIC: 35649 (Smaller is better.)
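As a quick interpretation aid, here is a sketch of the predicted tie probabilities implied by the joint model, using plogis() as in the earlier calculations (the change statistic for each nodematch term is 1 when the pair matches on that attribute and 0 otherwise):

```r
# Pull the coefficients: edges, nodematch.hivscreen, nodematch.nutrition.
b <- coef(hiv.nutrition.homophily.lhds)
plogis(b[1] + b[2] + b[3])  # dyad matches on both attributes (about 0.0055).
plogis(b[1] + b[2])         # dyad matches on hivscreen only.
plogis(b[1])                # dyad matches on neither attribute.
```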
hivscreen and nutrition.
# Create the colors for hivscreen.
hiv.cols <- lhds%v%"hivscreen"
hiv.cols[hiv.cols == "Y"] <- "blue"
hiv.cols[hiv.cols == "N"] <- "red"
hiv.cols[is.na(hiv.cols) ==TRUE] <- "white"
# Create shapes for nutrition.
nutr.shp <- lhds%v%"nutrition"
nutr.shp[nutr.shp == "Y"] <- 4
nutr.shp[nutr.shp == "N"] <- 3
nutr.shp[is.na(nutr.shp) ==TRUE] <- 50
nutr.shp <- as.numeric(nutr.shp)
# let's plot the data and see what it looks like.
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=hiv.cols, vertex.sides=nutr.shp)
title("Communication network for 1,283 health departments",
sub="Nodes Colored by HIV Screen (Blue = Y, Red = N, White = NA)\n
Nodes Shaped by Nutrition Program (Square = Y, Triangle = N, Circle = NA)"
)
In the visualization, do you see the homophily that we observed in the prior model?
hivscreen and nutrition ERGM and compare it to the observed LHDS network.
# Run the simulation.
sim.hiv.nutrition.homophily.lhds <- simulate.ergm(
hiv.nutrition.homophily.lhds, # we are going to use the estimates from this model for our simulation.
nsim=1, # simulate 1 network.
warning=FALSE, # suppress a warning that pops up when we use this function.
seed = 605 # set the seed so that it reproduces the same results.
)
## Warning: You appear to be calling simulate.ergm() directly. simulate.ergm()
## is a method, and will not be exported in a future version of 'ergm'. Use
## simulate() instead, or getS3method() if absolutely necessary.
## Warning: You appear to be calling simulate.formula() directly.
## simulate.formula() is a method, and will not be exported in a future
## version of 'ergm'. Use simulate() instead, or getS3method() if absolutely
## necessary.
# Set up the colors and shapes for the simulated network.
sim.hiv.cols <- sim.hiv.nutrition.homophily.lhds%v%"hivscreen"
sim.hiv.cols[sim.hiv.cols == "Y"] <- "blue"
sim.hiv.cols[sim.hiv.cols == "N"] <- "red"
sim.hiv.cols[is.na(sim.hiv.cols) ==TRUE] <- "white"
# Create shapes for nutrition.
sim.nutr.shp <- sim.hiv.nutrition.homophily.lhds%v%"nutrition"
sim.nutr.shp[sim.nutr.shp == "Y"] <- 4
sim.nutr.shp[sim.nutr.shp == "N"] <- 3
sim.nutr.shp[is.na(sim.nutr.shp)] <- 50
sim.nutr.shp <- as.numeric(sim.nutr.shp)
# Now, plot both of the networks.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=hiv.cols, vertex.sides=nutr.shp, main="Observed")
set.seed(605)
gplot(sim.hiv.nutrition.homophily.lhds, usearrows = FALSE, edge.col = "grey80", vertex.col=sim.hiv.cols, vertex.sides = sim.nutr.shp, main="Simulated")
par(op)
Assess the goodness of fit for the hivscreen and nutrition ERGM.
hiv.nutrition.homophily.lhds.gof <- gof(
hiv.nutrition.homophily.lhds, # our estimated model.
verbose = TRUE, # we want it to tell us what it is doing while it is doing it.
control = control.gof.ergm(seed = 605) # set the seed to reproduce results, note the control= argument.
)
# Now, print out the goodness of fit info.
hiv.nutrition.homophily.lhds.gof
##
## Goodness-of-fit for degree
##
## obs min mean max MC p-value
## 0 58 14 27.55 40 0.00
## 1 117 76 93.84 118 0.02
## 2 182 138 175.66 221 0.56
## 3 223 201 226.39 261 0.98
## 4 226 193 232.13 264 0.76
## 5 172 172 194.39 216 0.02
## 6 104 112 141.75 167 0.00
## 7 67 58 89.66 111 0.02
## 8 35 37 52.33 72 0.00
## 9 25 15 27.15 45 0.80
## 10 26 5 13.06 35 0.04
## 11 14 0 5.56 15 0.02
## 12 8 0 2.12 9 0.02
## 13 6 0 0.97 4 0.00
## 14 8 0 0.28 2 0.00
## 15 4 0 0.10 1 0.00
## 16 3 0 0.05 1 0.00
## 17 1 0 0.00 0 0.00
## 18 1 0 0.01 1 0.02
## 19 1 0 0.00 0 0.00
## 20 1 0 0.00 0 0.00
## 22 1 0 0.00 0 0.00
##
## Goodness-of-fit for edgewise shared partner
##
## obs min mean max MC p-value
## esp0 696 2544 2654.19 2794 0
## esp1 750 18 46.47 73 0
## esp2 630 0 0.39 2 0
## esp3 382 0 0.01 1 0
## esp4 156 0 0.00 0 0
## esp5 56 0 0.00 0 0
## esp6 25 0 0.00 0 0
## esp7 8 0 0.00 0 0
## esp8 3 0 0.00 0 0
## esp10 1 0 0.00 0 0
## esp11 1 0 0.00 0 0
##
## Goodness-of-fit for minimum geodesic distance
##
## obs min mean max MC p-value
## 1 2708 2580 2701.06 2842 0.82
## 2 7357 10568 11653.21 13019 0.00
## 3 15814 41653 47982.53 56076 0.00
## 4 32648 137207 159487.88 187419 0.00
## 5 63288 272237 292003.74 311877 0.00
## 6 95893 174732 203467.63 227626 0.00
## 7 111460 38621 57206.80 75252 0.00
## 8 100239 4139 9405.56 15591 0.00
## 9 73092 154 1233.70 3187 0.00
## 10 43767 1 157.52 1137 0.00
## 11 21752 0 19.14 395 0.00
## 12 10004 0 1.70 62 0.00
## 13 4761 0 0.20 10 0.00
## 14 2369 0 0.02 1 0.00
## 15 1080 0 0.00 0 0.00
## 16 394 0 0.00 0 0.00
## 17 135 0 0.00 0 0.00
## 18 37 0 0.00 0 0.00
## 19 4 0 0.00 0 0.00
## Inf 235601 17857 37082.31 50500 0.00
##
## Goodness-of-fit for model statistics
##
## obs min mean max MC p-value
## edges 2708 2580 2701.06 2842 0.82
## nodematch.hivscreen 2024 1900 2018.00 2128 0.92
## nodematch.nutrition 2028 1922 2026.03 2129 1.00
The default for gof() is to estimate the degree distribution, the edgewise shared partner distribution, and the geodesic distance distribution. Inspection of each one tells us about how well the model we have constructed is representing the processes that generated the observed network. Specifically:
The Goodness-of-fit for degree table indicates that the model is underestimating the number of nodes with degree 0 and degree 1. The model does a decent job estimating degree 2-4, but for the most part, the rest of the degree distribution is not well represented by our model.
The Goodness-of-fit for edgewise shared partner table shows that none of the edgewise shared partner counts are well represented by the model. In other words, our model is not capturing the transitivity in the network. Put differently, our model cannot generate networks with a level of transitivity similar to our observed network.
The Goodness-of-fit for minimum geodesic distance table also indicates a poor fit of the model. In particular, the model overpredicts how close the nodes are in the graph. That is, nodes in the network have a longer average path distance from each other than our model can generate.
#Set up the plotting pane.
op <- par(mfrow=c(2,2))
plot(hiv.nutrition.homophily.lhds.gof)
par(op)
Inspection of the plots shows the same information that we observed in the tables.
22.B. Look through the ergm.terms page (use ?ergm.terms). For the LHDS network, identify two other network configurations that could explain the observed network. Estimate and interpret the models.
Based on the results from the goodness-of-fit above, let’s try to account for the transitivity in the graph using the esp term. Specifically, let’s estimate an esp term for edgewise shared partners of 0 through 3. If you run this model: esp.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + esp(0:3)) you will see that it is degenerate, meaning that the MCMC algorithm is not working properly with this configuration.
Let’s take a look at the gwesp term, which was designed to handle model degeneracy.
# take a look at the info for the term.
?gwesp
# estimate the model.
gwesp.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + gwesp(fixed = TRUE))
# summarize the model.
summary(gwesp.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") +
## gwesp(fixed = TRUE)
##
## Iterations: 20 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -7.63367 0.06286 1 -121.444 <1e-04 ***
## nodematch.hivscreen 0.61953 0.06222 1 9.958 <1e-04 ***
## nodematch.nutrition 0.33950 0.06183 1 5.491 <1e-04 ***
## gwesp.fixed.0 2.44541 0.04164 2 58.728 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 31235 on 822399 degrees of freedom
##
## AIC: 31243 BIC: 31289 (Smaller is better.)
This model also struggles to find a solution; note that it ran the full 20 iterations, suggesting the estimation did not fully converge. The next step would be to tune the estimation using the control.ergm() function.
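A side note on what gwesp(fixed = TRUE) is actually counting. With the default decay of 0 (hence the coefficient label gwesp.fixed.0), the geometrically weighted statistic reduces to the number of edges with at least one shared partner. Here is a sketch of the weighting in Python, following the geometrically weighted form in Hunter (2007), applied to the observed esp counts from the goodness-of-fit table above:

```python
import math

def gwesp(esp_counts, decay):
    """Geometrically weighted ESP statistic: an edge with k shared
    partners contributes exp(decay) * (1 - (1 - exp(-decay))**k)."""
    return math.exp(decay) * sum(
        (1 - (1 - math.exp(-decay)) ** k) * n for k, n in esp_counts.items()
    )

# Observed esp counts for the LHDS network, from the gof() table above (k >= 1).
obs_esp = {1: 750, 2: 630, 3: 382, 4: 156, 5: 56, 6: 25, 7: 8, 8: 3, 10: 1, 11: 1}

# At decay = 0 every edge with k >= 1 gets weight exactly 1, so the
# statistic is simply the number of edges with at least one shared partner.
print(int(gwesp(obs_esp, decay = 0)))  # 2012 of the 2708 observed edges
```

At larger decay values each additional shared partner still adds weight, but the increments decline geometrically, which is what helps avoid the degeneracy that the raw esp terms produce.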
Finally, let’s try the gwdegree term. This takes into account the degree distribution.
# take a look at the info for the term.
?gwdegree
# estimate the model.
gwdegree.lhds <- ergm(lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") + gwdegree(fixed = TRUE))
# summarize the model.
summary(gwdegree.lhds)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: lhds ~ edges + nodematch("hivscreen") + nodematch("nutrition") +
## gwdegree(fixed = TRUE)
##
## Iterations: 2 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -6.61574 0.04977 0 -132.938 <1e-04 ***
## nodematch.hivscreen 0.92391 0.04292 0 21.524 <1e-04 ***
## nodematch.nutrition 0.53405 0.04391 0 12.162 <1e-04 ***
## gwdeg.fixed.0 -0.91982 0.14842 0 -6.197 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 1140093 on 822403 degrees of freedom
## Residual Deviance: 35576 on 822399 degrees of freedom
##
## AIC: 35584 BIC: 35631 (Smaller is better.)
That worked! Now, let’s take a look at the model fit.
gwdegree.lhds.gof <- gof(
gwdegree.lhds, # our estimated model.
GOF = ~ degree + espartners + distance, # the degree, edgewise shared partner, and geodesic distance distributions.
verbose = TRUE, # we want it to tell us what it is doing while it is doing it.
control = control.gof.ergm(seed = 605) # set the seed to reproduce results, note the control= argument.
)
#Set up the plotting pane.
op <- par(mfrow=c(2,2))
plot(gwdegree.lhds.gof)
par(op)
Evaluation of the goodness-of-fit shows that we are now fitting the degree distribution better than before. That makes sense given that we included a term that models the degree distribution with geometrically decreasing weights. However, we are still struggling with the edgewise shared partner distribution and the geodesic distance distribution.
The Talked About the Course network, stored as the object talk.course.net (an object of class network), is available at: https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/crj_605_networks_spring_2019_class_survey_data_w1.rdata.
Do the following:
# clear the workspace.
rm(list = ls())
# Load the data.
load(url("https://www.jacobtnyoung.com/uploads/2/3/4/5/23459640/crj_605_networks_spring_2019_class_survey_data_w1.rdata"))
# Take a look at the talked about the course network.
summary(talk.course.net, print.adj = FALSE)
## Network attributes:
## vertices = 15
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges = 59
## missing edges = 0
## non-missing edges = 59
## density = 0.2809524
##
## Vertex attributes:
##
## GenTrustt1:
## integer valued attribute
## 15 values
##
## KnowNett1:
## integer valued attribute
## 15 values
##
## Mealst1:
## integer valued attribute
## 15 values
##
## Musict1:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## Country Folk Indie 80's Alternative
## 2 2 2 1
## AH. Rock Alternative Classic Jazz
## 1 1 1 1
## Pop Rap
## 1 1
##
## Sleept1:
## integer valued attribute
## 15 values
##
## Studyt1:
## integer valued attribute
## 15 values
##
## TV1t1:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## Brooklyn 99 Parks and Rec
## 2 2
## The Office Baby Boss
## 2 1
## Dynasty Fox and Friends
## 1 1
## Lethal Weapon Manhunt: Unabomber
## 1 1
## Star Trek Star Trek: Next Generation
## 1 1
##
## TV2t1:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## Atypical Brooklyn 99 Catastrophe Friends
## 1 1 1 1
## Ghost Adventures Jeopardy Motive Plantet Earth
## 1 1 1 1
## Queer Eye The Crown
## 1 1
##
## TV3t1:
## character valued attribute
## attribute summary:
## the 10 most common values are:
## America's Newsroom Big Bang Theory
## 1 1
## Blue Planet 2 Brooklyn 99
## 1 1
## Fox News Game Face
## 1 1
## Great British Baking Show Grey's Anatomy
## 1 1
## Happy Insecure
## 1 1
## vertex.names:
## character valued attribute
## 15 valid vertex names
##
## Edge attributes:
##
## spend.time.net:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3559 1.0000 1.0000
##
## trust.net:
## numeric valued attribute
## attribute summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5254 1.0000 1.0000
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 4th one for the colors.
# lets plot the data and see what it looks like.
set.seed(605)
gplot(talk.course.net, edge.col = "grey80", vertex.col=mycols[4])
title("Talked about the Course Network")
# define a few terms.
g <- dim(as.matrix(talk.course.net))[1]
l <- sum(as.matrix(talk.course.net)) # total number of directed edges.
den <- l / (g*(g-1)) # note that here we do not divide by 2 because the network is directed.
# generate the random graph.
set.seed(605)
random.graph <- rgraph(
g, # the number of nodes in talk.course.net.
1, # we want just 1 random graph.
tprob = den, # the density of talk.course.net.
mode = "digraph" )
random.net <- as.network(random.graph, directed = TRUE)
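As a quick arithmetic check on the tprob argument (sketched here in Python, using the counts from the summary() output above): a directed graph with 15 nodes has 15 × 14 = 210 possible ordered ties, so 59 edges give exactly the density that summary() reported.

```python
g = 15      # nodes in talk.course.net
edges = 59  # total directed edges, from summary()

# For a directed graph we do NOT divide by 2: every ordered pair (i, j)
# with i != j is a possible tie.
possible_ties = g * (g - 1)
density = edges / possible_ties
print(possible_ties, round(density, 7))  # 210 0.2809524
```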
# Get some colors to use.
library(RColorBrewer)
mycols<-brewer.pal(11,"Spectral")
#display.brewer.pal(11,"Spectral") # I like the 4th and 10th one for the colors.
# Now, take a look at both graphs.
op <- par(mfrow=c(1,2), mai = c(0,0,0.7,0))
set.seed(605)
gplot(talk.course.net, edge.col = "grey80", vertex.col=mycols[4])
title("Talked about \nthe Course Network")
set.seed(605)
gplot(random.net, edge.col = "grey80", vertex.col=mycols[10])
title("Random network")
par(op)
The major difference is that there is no core/periphery structure in the random network. Also, there are no isolates in the random network. Interestingly, the random network has 56 edges, whereas the observed network has 59 edges. Yet, visually, the random network appears to have more going on.
In looking at the indegree distributions, we can see that there is not a huge amount of difference between the talk.course.net network and the random network. The only difference is that there are individuals receiving no ties (i.e. indegree = 0) in the talk.course.net network, but not in the random network. For the outdegree distributions, the individuals with outdegree = 0 stand out in the talk.course.net network, whereas the random network does not have this structural feature.
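The in- and outdegree distributions come straight from the adjacency matrix: row sums give outdegree (ties sent) and column sums give indegree (ties received). A minimal Python sketch on a hypothetical 4-node directed matrix:

```python
# Toy directed adjacency matrix (hypothetical, 4 nodes):
# entry adj[i][j] = 1 means node i sends a tie to node j.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]

# Outdegree = row sums (ties sent); indegree = column sums (ties received).
outdeg = [sum(row) for row in adj]
indeg = [sum(col) for col in zip(*adj)]
print(outdeg)  # [2, 1, 0, 2]
print(indeg)   # [1, 1, 3, 0]
```

The third node sends no ties (outdegree 0) but receives three, which is exactly the kind of asymmetry that only shows up when the two distributions are examined separately.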
The ERGM expresses the probability of observing a tie between nodes i and j given some terms (i.e. network configurations). The edge independence model of Erdős and Rényi (1959) uses a single term, the number of edges in the graph, to represent the probability of a tie. In the ergm package this is the edges term.
# run the model.
edge.indep.talk <- ergm(talk.course.net ~ edges)
# check out the summary.
summary(edge.indep.talk)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -0.9397 0.1535 0 -6.121 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 249.4 on 209 degrees of freedom
##
## AIC: 251.4 BIC: 254.8 (Smaller is better.)
The value of -0.9397 indicates that the density of the network is far below .5 or 50%. Recall that an edges term of 0 would represent 50% or 0.5 density.
If we plug in the value of -0.9397 that is returned by the ergm() function, we get:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.9397 \times 1)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = \frac{1}{1 + e^{-(-0.9397 \times 1)}} = 0.2809609\)
Does the value 0.281 look familiar?
The value of 0.281 is returned by using the command plogis().
# we could write it out.
plogis(-0.9397*1)
## [1] 0.2809609
# or, we could pull from the model.
plogis(edge.indep.talk$coef[1]*1)
## edges
## 0.2809524
The coefficient of -0.9397 indicates that the probability of a tie is 0.281 for this model.
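The correspondence runs both ways: for the edge independence model, the maximum likelihood estimate of the edges coefficient is simply the log-odds (logit) of the observed density. A quick Python check, using the density from the summary output:

```python
import math

# Observed density of talk.course.net: 59 directed ties out of 15*14 dyads.
density = 59 / 210

# For the Bernoulli (edge independence) model, the MLE of the edges
# coefficient is the logit of the density...
theta_edges = math.log(density / (1 - density))
print(round(theta_edges, 4))  # -0.9397

# ...and the logistic function maps it back to the density.
print(round(1 / (1 + math.exp(-theta_edges)), 7))  # 0.2809524
```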
The correlation matrix indicates that individuals who sleep more are less likely to talk about the course with another person (i.e. outdegree) and are less likely to have someone talk about the course with them (i.e. indegree). Also, note from the correlation matrix that individuals who send more ties are more likely to receive ties.
# First, take a look at the values for the sleep attribute.
table(talk.course.net%v%"Sleept1")
##
## 4 6 7 8
## 2 5 3 5
# Since the sleep variable is continuous, and not categorical, we want to use the nodecov term.
?nodecov
# run the model.
sleep.deg.talked <- ergm(talk.course.net ~ edges + nodecov("Sleept1"))
# check out the summary.
summary(sleep.deg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodecov("Sleept1")
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 4.45423 1.21655 0 3.661 0.000251 ***
## nodecov.Sleept1 -0.41671 0.09469 0 -4.401 < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 227.7 on 208 degrees of freedom
##
## AIC: 231.7 BIC: 238.4 (Smaller is better.)
Note that the nodecov term adds one network statistic to the model for the variable we have passed. For the nodecov term, positive values indicate that a node with that particular value of the attribute is more likely to send and/or receive a tie as the value of the attribute increases (note the difference in interpretation from nodefactor). Alternatively, a negative value indicates that a node with that particular value of the attribute is less likely to send and/or receive a tie as the value of the attribute increases.
Let’s look at this for our example. Looking at the table we see:
nodecov.Sleept1 has a value of -0.41671, indicating that individuals who sleep more are less likely to send/receive a tie, compared to those who sleep less.
Recall that in the general formulation of the ERGM, the \(\delta\) represents the change statistic, or the change in the statistic of interest when an edge is added (i.e. \(Y_{ij}\) goes from 0 to 1). The change statistic for the nodecov term is different from the nodefactor term we saw above. Since the predictor is continuous, we just plug in the value for how much change we want in the variable. This is similar to a typical regression where the interpretation of the \(\beta\) coefficient is the change in y for a unit change in the predictor.
Using the coefficient of -0.41671, the predicted probability of a tie being incident on a node with the mean level of sleep (i.e. mean(talk.course.net%v%"Sleept1")) of 6.6 hours is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{sleep} \times \delta_{sleep})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((4.45423 \times 1) + (-0.41671 \times 6.6))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(4.45423 + -2.750286)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(1.703944) = 0.8460491\)
The value of 0.846 is returned by using the command plogis().
plogis(sleep.deg.talked$coef[1]*1 + sleep.deg.talked$coef[2]*6.6)
## edges
## 0.8460445
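Another way to read the coefficient: because the model is linear on the log-odds scale, each additional hour of sleep multiplies the odds of a tie by exp(-0.41671), about 0.66. A Python sketch using the estimates from the model above:

```python
import math

def logistic(x):
    return 1 / (1 + math.exp(-x))

# Estimates from the sleep.deg.talked summary above.
theta_edges, theta_sleep = 4.45423, -0.41671

# Predicted tie probability at the mean of 6.6 hours, and at one hour more.
p_66 = logistic(theta_edges + theta_sleep * 6.6)
p_76 = logistic(theta_edges + theta_sleep * 7.6)
print(round(p_66, 3), round(p_76, 3))  # 0.846 0.784

# Multiplicative change in the odds of a tie per extra hour of sleep.
print(round(math.exp(theta_sleep), 3))  # 0.659
```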
The nodecov term estimates an effect for an edge being incident on a node, but does not take into account the fact that individuals may differ in the receiving and sending of ties. To account for this difference, we can use the nodeicov and nodeocov terms to allow an attribute to influence individuals receiving of ties and sending of ties, respectively.
# run the model.
sleep.odeg.talked <- ergm(talk.course.net ~ edges + nodeocov("Sleept1"))
# check out the summary.
summary(sleep.odeg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodeocov("Sleept1")
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 0.7593 0.7605 0 0.998 0.3180
## nodeocov.Sleept1 -0.2611 0.1159 0 -2.252 0.0243 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 244.3 on 208 degrees of freedom
##
## AIC: 248.3 BIC: 255 (Smaller is better.)
Note that the nodeocov term adds one network statistic to the model for the variable we have passed. For the nodeocov term, positive values indicate that a node with that particular value of the attribute is more likely to send a tie as the value of the attribute increases (again, note the difference in interpretation from nodefactor and nodeofactor). Alternatively, a negative value indicates that a node with that particular value of the attribute is less likely to send a tie as the value of the attribute increases.
Let’s look at this for our example. Looking at the table we see:
the term nodeocov.Sleept1 has a value of -0.2611, indicating that individuals who sleep more are less likely to send a tie, compared to those who sleep less.
this is consistent with the negative correlation between Sleept1 and outdegree noted above.
Using the coefficient of -0.2611, the predicted probability of a node sending a tie with the mean level of sleep (i.e. mean(talk.course.net%v%"Sleept1")) of 6.6 hours is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{out.sleep} \times \delta_{out.sleep})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((0.7593 \times 1) + (-0.2611 \times 6.6))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(0.7593 + -1.72326)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.96396) = 0.276086\)
The value of 0.276 is returned by using the command plogis().
plogis(sleep.odeg.talked$coef[1]*1 + sleep.odeg.talked$coef[2]*6.6)
## edges
## 0.2761008
This model would use the nodeicov term to capture variation in indegree based on sleep.
# run the model.
sleep.ideg.talked <- ergm(talk.course.net ~ edges + nodeicov("Sleept1"))
# check out the summary.
summary(sleep.ideg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodeicov("Sleept1")
##
## Iterations: 5 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 2.0853 0.7788 0 2.678 0.007414 **
## nodeicov.Sleept1 -0.4694 0.1207 0 -3.889 0.000101 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 233.4 on 208 degrees of freedom
##
## AIC: 237.4 BIC: 244.1 (Smaller is better.)
Note that the nodeicov term adds one network statistic to the model for the variable we have passed. For the nodeicov term, positive values indicate that a node with that particular value of the attribute is more likely to receive a tie as the value of the attribute increases (again, note the difference in interpretation from nodefactor and nodeifactor). Alternatively, a negative value indicates that a node with that particular value of the attribute is less likely to receive a tie as the value of the attribute increases.
Let’s look at this for our example. Looking at the table we see:
the term nodeicov.Sleept1 has a value of -0.4694, indicating that individuals who sleep more are less likely to receive a tie, compared to those who sleep less.
this is consistent with the negative correlation between Sleept1 and indegree noted above.
Using the coefficient of -0.4694, the predicted probability of a node receiving a tie with the mean level of sleep (i.e. mean(talk.course.net%v%"Sleept1")) of 6.6 hours is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{in.sleep} \times \delta_{in.sleep})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((2.0853 \times 1) + (-0.4694 \times 6.6))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(2.0853 + -3.09804)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.01274) = 0.266444\)
The value of 0.266 is returned by using the command plogis().
plogis(sleep.ideg.talked$coef[1]*1 + sleep.ideg.talked$coef[2]*6.6)
## edges
## 0.266422
For this model, we would use both the nodeicov and nodeocov terms for sleep.
# run the model.
sleep.i.o.deg.talked <- ergm(talk.course.net ~ edges + nodeicov("Sleept1") + nodeocov("Sleept1"))
# check out the summary.
summary(sleep.i.o.deg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodeicov("Sleept1") + nodeocov("Sleept1")
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 4.4598 1.2213 0 3.652 0.00026 ***
## nodeicov.Sleept1 -0.5104 0.1246 0 -4.096 < 1e-04 ***
## nodeocov.Sleept1 -0.3246 0.1228 0 -2.643 0.00822 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 226.3 on 207 degrees of freedom
##
## AIC: 232.3 BIC: 242.4 (Smaller is better.)
Let’s look at this for our example. Looking at the table we see:
the term nodeicov.Sleept1 has a value of -0.5104, indicating that individuals who sleep more are less likely to receive a tie, compared to those who sleep less.
the term nodeocov.Sleept1 has a value of -0.3246, indicating that individuals who sleep more are less likely to send a tie, compared to those who sleep less.
Using the coefficients of -0.5104 and -0.3246, the predicted probability of a node receiving and sending a tie with the mean level of sleep (i.e. mean(talk.course.net%v%"Sleept1")) of 6.6 hours is:
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{i.sleep} \times \delta_{i.sleep} + \theta_{o.sleep} \times \delta_{o.sleep})\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((4.4598 \times 1) + (-0.5104 \times 6.6) + (-0.3246 \times 6.6))\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(4.4598 + -3.36864 + -2.14236)\)
\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.0512) = 0.2589\)
The value of 0.259 is returned by using the command plogis().
plogis(sleep.i.o.deg.talked$coef[1]*1 + sleep.i.o.deg.talked$coef[2]*6.6 + sleep.i.o.deg.talked$coef[3]*6.6)
## edges
## 0.2589018
Note that the model with the nodecov term is nested in the model that separates this into nodeicov and nodeocov. This means we can determine whether indegree and outdegree differ based on sleep by comparing these models. If separating the effect does not improve model fit, then the simpler nodecov model is preferable, and we can conclude that indegree and outdegree do not differ based on sleep. We can compare the models using their AIC and BIC.
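The AIC and BIC reported by summary() can be reproduced by hand from the residual deviance, the number of parameters k, and the number of directed dyads n = 15 × 14 = 210: AIC = deviance + 2k and BIC = deviance + k·log(n). A Python sketch using the (rounded) deviances from the two model summaries:

```python
import math

n_dyads = 15 * 14  # directed dyads in talk.course.net

def aic(deviance, k):
    return deviance + 2 * k

def bic(deviance, k, n):
    return deviance + k * math.log(n)

# Simpler model (edges + nodecov): residual deviance 227.7, k = 2 parameters.
print(round(aic(227.7, 2), 1), round(bic(227.7, 2, n_dyads), 1))  # 231.7 238.4

# More complex model (edges + nodeicov + nodeocov): deviance 226.3, k = 3.
# (BIC comes out 242.3 here rather than the reported 242.4 because the
# deviance above is rounded to one decimal place.)
print(round(aic(226.3, 3), 1), round(bic(226.3, 3, n_dyads), 1))  # 232.3 242.3
```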
# The more complex model.
summary(sleep.i.o.deg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodeicov("Sleept1") + nodeocov("Sleept1")
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 4.4598 1.2213 0 3.652 0.00026 ***
## nodeicov.Sleept1 -0.5104 0.1246 0 -4.096 < 1e-04 ***
## nodeocov.Sleept1 -0.3246 0.1228 0 -2.643 0.00822 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 226.3 on 207 degrees of freedom
##
## AIC: 232.3 BIC: 242.4 (Smaller is better.)
# The simpler model.
summary(sleep.deg.talked)
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: talk.course.net ~ edges + nodecov("Sleept1")
##
## Iterations: 4 out of 20
##
## Monte Carlo MLE Results:
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges 4.45423 1.21655 0 3.661 0.000251 ***
## nodecov.Sleept1 -0.41671 0.09469 0 -4.401 < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 291.1 on 210 degrees of freedom
## Residual Deviance: 227.7 on 208 degrees of freedom
##
## AIC: 231.7 BIC: 238.4 (Smaller is better.)
The simpler model shows a smaller AIC and BIC, indicating that it is preferred over the more complex model.